13 research outputs found

    Generation of Customized RISC-V Implementations

    Get PDF
    Processor customization has become increasingly important for achieving better performance and energy efficiency in embedded systems. However, customizing processors is time-consuming and error-prone work. The design effort is reduced by describing the processor architecture with high-level languages that are then used to generate the processor implementation. In addition to processor customization, open source hardware and standardization have become increasingly more popular. RISC-V that is a relatively new open standard instruction set architecture, has gained traction both in academia and industry. This thesis work added a RISC-V extension to the OpenASIP toolset that is developed at Tampere University. OpenASIP has wide support for customizing and generating transport triggered architectures. Transport triggered architectures have an exposed datapath that is visible to the programmer, which allows a lower level programming interface. The hardware generation and customization features in OpenASIP were reused by utilizing a transport triggered architecture as the internal microarchitecture together with a microcode unit. The extension generates the RISC-V implementations from an architecture description, which reduces the design effort of customizing the implementation. The RISC-V generator developed in this thesis has customization points for the bypass network, amount of pipeline stages, operation latencies and an optional addition of the standard M extension. The generator was evaluated by generating RISC-V cores with different customization points and comparing their performance and post-synthesis properties with open source implementations. The generated cores with bypass network achieved better performance while consuming slightly more area than the smallest reference design. The microcode hardware only utilized 3.6% of the design area and did not affect the maximum clock frequency

    OpenASIP 2.0 : Co-Design Toolset for RISC-V Application-Specific Instruction-Set Processors

    Get PDF
    Application-specific instruction-set processors (ASIPs) are interesting for improving performance or energy-efficiency for a set of applications of interest while supporting flexibility via compiler-supported programmability. In the past years, the open source hardware community has become extremely active, mainly fueled by the massive popularity of the open-standard RISC-V instruction set architecture. However, the community still lacks an open source ASIP co-design tool that supports rapid customization of RISC-V-based processors with an automatically retargetable programming toolchain. To this end, we introduce OpenASIP 2.0: A co-design toolset that is built on top of our earlier ASIP customization toolset work by extending it to support customization of RISC-V-based processors. It enables RTL generation as well as high-level language programming of RISC-V processors with custom instructions. In this paper, in addition to describing the toolset's key technical internals, we demonstrate it with customization cases for AES, CRC and SHA applications. With the example custom instructions easily integrated using the toolset, the run time was reduced by 44% on average compared to the standard RISC-V ISA. The speedups were achieved with a negligible datapath area overhead of 1.5%, and a 1.4% reduction in the maximum clock frequency.acceptedVersionPeer reviewe

    Programmable Dictionary Code Compression for Instruction Stream Energy Efficiency

    Get PDF
    We propose a novel instruction compression scheme based on fine-grained programmable dictionaries. In its core is a compile-time region-based control flow analysis to selectively update the dictionary contents at runtime, minimizing the update overheads, while maximizing the beneficial use of the dictionary slots. Unlike in the previous work, our approach selects regions of instructions to compress at compile time and changes dictionary contents in a fine-grained manner at runtime with the primary goal of reducing the energy footprint of the processor instruction stream. The proposed instruction compression scheme is evaluated using RISC-V as an example instruction set architecture. The energy savings are compared to an instruction scratch pad and a filter cache as the next level storage. The method reduces instruction stream energy consumption up to 21 % and 5.5 % on average when compared to the RISC-V C extension with a 1% runtime overhead and a negligible hardware overhead. The previous state-of-the-art programmable dictionary compression method provides a slightly better compression ratio, but induces about 30 % runtime overhead.acceptedVersionPeer reviewe

    Energy-Efficient Instruction Delivery in Embedded Systems with Domain Wall Memory

    Get PDF
    As performance and energy-efficiency improvements from technology scaling are slowing down, new technologies are being researched in hopes of disrupting results. Domain wall memory (DWM) is an emerging non-volatile technology that promises extreme data density, fast access times and low power consumption. However, DWM access time depends on the memory location distance from access ports, requiring expensive shifting. This causes overheads on performance and energy consumption. In this article, we implement our previously proposed shift-reducing instruction memory placement (SHRIMP) on a RISC-V core in RTL, provide the first thorough evaluation of the control logic required for DWM and SHRIMP and evaluate the effects on system energy and energy-efficiency. SHRIMP reduces the number of shifts by 36% on average compared to a linear placement in CHStone and Coremark benchmark suites when evaluated on the RISC-V processor system. The reduced shift amount leads to an average reduction of 14% in cycle counts compared to the linear placement. When compared to an SRAM-based system, although increasing memory usage by 26%, DWM with SHRIMP allows a 73% reduction in memory energy and 42% relative energy delay product. We estimate overall energy reductions of 14%, 15% and 19% in three example embedded systems.publishedVersionPeer reviewe

    AEx: Automated High-Level Synthesis of Compiler Programmable Co-Processors

    Get PDF
    Modern High Level Synthesis (HLS) tools succeed well in their engineering productivity goal, but still require toolset and target technology specific modifications to the source code to guide the process towards an efficient implementation. Furthermore, their end result is a fixed function accelerator with limited field and runtime flexibility. In this paper we describe the status of AEx, a novel work-in-progress HLS tool developed in the FitOptiVis ECSEL JU project. AEx is based on automated exploration of architectures using a flexible and lightweight parallel co-processor template. We compare its current performance in CHStone C-language benchmarks to the state of the art FPGA HLS tool Vitis, provide ASIC implementation numbers, and identify the main remaining toolset features that are expected to dramatically further improve the performance. The potential is explored with a hand-optimized case study that shows only 1.64x performance slowdown with the programmable co-processor in comparison to the fixed function Vitis HLS result.</p

    AEx: Automated High-Level Synthesis of Compiler Programmable Co-Processors

    No full text
    Modern High Level Synthesis (HLS) tools succeed well in their engineering productivity goal, but still require toolset and target technology specific modifications to the source code to guide the process towards an efficient implementation. Furthermore, their end result is a fixed function accelerator with limited field and runtime flexibility. In this paper we describe the status of AEx, a novel work-in-progress HLS tool developed in the FitOptiVis ECSEL JU project. AEx is based on automated exploration of architectures using a flexible and lightweight parallel co-processor template. We compare its current performance in CHStone C-language benchmarks to the state of the art FPGA HLS tool Vitis, provide ASIC implementation numbers, and identify the main remaining toolset features that are expected to dramatically further improve the performance. The potential is explored with a hand-optimized case study that shows only 1.64x performance slowdown with the programmable co-processor in comparison to the fixed function Vitis HLS result.Computer Engineerin
    corecore